Gene and repetitive sequence annotation in the Triticeae

Authors

  • Thomas Wicker
  • C. Robin Buell
Abstract

The Triticeae tribe contains some of the world's most important agricultural crops (wheat, barley and rye) and is perhaps one of the most challenging for genome annotation because Triticeae genomes are primarily composed of repetitive sequences. Further complicating the challenge is the polyploidy found in wheat, particularly in the hexaploid bread wheat genome. Genomic sequence data are available for the Triticeae in the form of large collections of Expressed Sequence Tags (>1.5 million) and an increasing number of bacterial artificial chromosome clone sequences. Because the high repetitive sequence content in the Triticeae confounds annotation of protein-coding genes, repetitive sequences have been identified, annotated, and collated into public databases. Protein-coding genes in the Triticeae are structurally annotated using a combination of ab initio gene finders and experimental evidence. Functional annotation of protein-coding genes involves assessment of sequence similarity to known proteins, expression evidence, and the presence of domains and motifs. Annotation methods and tools for Triticeae genomic sequences have been adapted from existing plant genome annotation projects and were designed to allow flexible annotation of single sequences while allowing a whole-community annotation effort to be developed. With the availability of an increasing number of annotated grass genomes, comparative genomics can be exploited to accelerate and enhance the quality of Triticeae sequence annotation. This chapter provides a brief overview of the Triticeae genome features that are challenging for genome annotation and describes the resources and methods available for sequence annotation, with a particular emphasis on problems caused by the repetitive fraction of these genomes.

Author affiliations

  • Thomas Wicker: Institute of Plant Biology, University Zurich, Zollikerstrasse 107, CH-8008 Zurich
  • C. Robin Buell: Department of Plant Biology, Michigan State University, East Lansing MI 48824 USA


Similar resources

Development and annotation of perennial Triticeae ESTs and SSR markers.

Triticeae contains hundreds of species of both annual and perennial types. Although substantial genomic tools are available for annual Triticeae cereals such as wheat and barley, the perennial Triticeae lack sufficient genomic resources for genetic mapping or diversity research. To increase the amount of sequence information available in the perennial Triticeae, three expressed sequence tag (ES...


Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome, which is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...


Revolver and Superior: Novel Transposon-Like Gene Families of the Plant Kingdom

High-throughput sequencing of eukaryotic genomes has revived interest in the structure and function of repetitive genomic sequences, previously referred to as junk DNA. Repetitive sequences, including transposable elements, are now believed to play a significant role in genomic differentiation and evolution. Some are also expressed as regulatory noncoding RNAs. Vast DNA databases exist for high...


Triticeae Resources in Ensembl Plants

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related w...


Global Landscape of a Co-Expressed Gene Network in Barley and its Application to Gene Discovery in Triticeae Crops

Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression...



Publication date: 2010